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Abstract 

Matrix factorizations and their extensions to tensor factorizations and decompositions 
have become prominent techniques for linear and multilinear blind source separation (BSS), 
especially multiway Independent Component Analysis (ICA), Nonnegative Matrix and Ten- 
sor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Compo- 
nent Analysis (SCA). Moreover, tensor decompositions have many other potential applica- 
tions beyond multilinear BSS, especially feature extraction, classification, dimensionality 
reduction and multiway clustering. In this paper, we briefly overview new and emerging 
models and approaches for tensor decompositions in applications to group and linked multi- 
way BSS/TCA, feature extraction, classification and Multiway Partial Least Squares (MPLS) 
regression problems. 

Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and de- 
compositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), 
feature extraction, classification, multiway PLS and CCA. 

1 Introduction 

Although the basic models for tensor (i.e., multiway array) decompositions and factoriza- 
tions such as Tucker and (Canonical-decomposition/Parafac) CP models were proposed long 
time ago, they are recently emerging as promising tools for exploratory analysis of multi- 
dimensional data in diverse applications, especially, in multiway blind source separation 
(BSS), feature extraction, classification, prediction, and multiway clustering []Tl|2l[3]H|5). 
The recent advances in neuroimage technologies (e.g., high density array EEG/MEG, fMRI, 
NIRS) have generated massive amounts of brain data exhibiting high dimensionality, mul- 
tiple modality and multiple couplings, functional connectivity. By virtue of their multiway 
nature, tensors provide a powerful tools for analysis and fusion of such massive data to- 
gether with a mathematical backbone for the discovery of underlying hidden complex data 
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structures [ED [6] ED- Constrained Matrix Factorizations, called sometimes penalized ma- 
trix decompositions (e.g., ICA, SCA, NMF, SVD/PCA, CCA) and their extensions or pe- 
nalized tensor decompositions - multidimensional constrained models (Tucker, CP, NTF, 
NTD models ©EOlZllElX with some constraints such as statistical independence, decorre- 
lation, orthogonality, sparseness, nonnegativity, and smoothness - have been recently pro- 
posed as meaningful and efficient representations of signals, images and in general natural 
multidimensional data |[TJ. From signal processing and data analysis point of view, ten- 
sor decompositions are very attractive because they take into account spatial, temporal and 
spectral information and provide links among various data and extracted factors or hidden 
(latent variables) components with physical or physiological meaning and interpretations 
[[B [Hllini In fact, tensor decompositions are emerging techniques for data fusion, di- 

mensionality reduction, pattern recognition, object detection, classification, multiway clus- 
tering, sparse representation and coding, and multilinear blind source separation (MBSS) 

muziiniiHiiBi. 

Basic notations. Tensors (i.e., multiway arrays) are denoted by underlined capital bold- 
face letters, e.g., Y e r/ix^x-x/^ or( j er f a tensor is the number of modes, also known 
as ways or dimensions (e.g., space, time, frequency, subjects, trials, classes, groups, condi- 
tions). In contrast, matrices (two-way tensors) are denoted by boldface capital letters, e.g., 
Y; vectors (one-way tensors) are denoted by boldface lowercase letters, e.g., columns of the 
matrix U by u 7 and scalars are denoted by lowercase letters, e.g., w, 7 -. 

The mode-n product Y = G x„ U of a tensor G e r^x-^x-x/jv anc j a ma trix U e R lxj " is 

a tensor Y e R J ^~ XJ - - /x/ * Jn , with elements v ; / . k , h = , g juh a >'/,./,- 

Unfolding (matricization, flattening) of a tensor Y 6 r'ix^x-x/a, m n-mode is denoted as 
Y ( „) G K. / »x(A-4-i4+i,-A') j w hich consists of arranging all possible n-mode tubes in as columns 
of a matrix. B2. Throughout this paper, standard notations and basic tensor operations are 
used AD. 

2 Basic Models for Multilinear BSS/ICA 

Most of the linear blind source separation (BSS) models can be represented as constrained 
matrix factorization problems, with suitable constraints imposed on factor matrices (or their 
columns -referred to as components) 

Y = AB r + E = ^ a 7 b J + E = Yj a 7 ° b 7 + E , (1) 

7=1 i=i 

where o denotes outer produclQ, Y = \y it ] e R /xr is a known data matrix (representing 
observations or measurements), E = [e it ] e R /xT represents errors or noise, A = [a^] = 
[ai,a 2 , e R /Xj is an unknown basis (mixing) matrix, with basis vectors a 7 - e R 7 

'The outer product of two vectors a e R 7 , b e R 7 " builds up a rank-one matrix Y = a o b = ab 7 e R 7x7 
and the outer product of three vectors: a e R 7 , b e R 7 , c € R G builds up a third-order rank-one tensor: 
Y = aoboce R IxTxQ , with entries defined as yn q = a,- b, c q . 
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and the / columns of a matrix B = [t>i , b 2 , . . . , bj] e R TxJ represent unknown components, 
latent variables or sources bj. 

Remark: We notice that we have symmetry in the factorization: For Eq dD we could 
just as easily write Y r = BA r , so the meaning of sources and mixture are often somewhat 
arbitrary. 

Our primary objective in the BSS is to estimate uniquely (neglecting unavoidable scaling 
and permutation ambiguities) factor matrices A and B, subject to various specific constraints 
imposed on the vectors bj and/or ay, such as mutual statistical independence (ICA), sparse- 
ness (SCA), smoothness (SmoCA), nonnegativity (NMF) or orthogonality (PCA/SVD), un- 
correlation, etc. 

In some applications the data matrix Y is factorized into three or more factors [1]. In the 
special case, of a singular value decomposition (SVD) of the data matrix Y e R /xT , we have 
the following factorization: 

Y = ADB T = D Xj A x 2 B = ^ d, 7 a y bj, (2) 

i 

where A e R /x/ and B e R rxr are orthogonal matrices and D is a diagonal matrix containing 
only nonnegative singular values. The SVD and its generalizations play key roles in signal 
processing and data analysis [fl5l . 

Often multiple subject, multiple task data sets can be represented by a set of data matri- 
ces Y„ and it is necessary to perform simultaneous constrained matrix factorizations|l: 

Y„=AX (n = l,2,...,JV) (3) 

or in preprocessed form with a dimensionally reduction 

Y„-Q„A n P«B£,(« = l,2,...,iV), (4) 

subject to various additional constraints (e.g., B„ = B, Vrc and their columns are mutually in- 
dependent and/or sparse). This problem is related to various models of group ICA, with suit- 
able pre-processing, dimensionality reduction and post-processing procedures [fl6l [P71 IT81 . 
In this paper, we introduce the group multiway BSS concept, which is more general and 
flexible than the group ICA, since various constraints can be imposed on factor matrices 
in different modes (i.e., not only mutual independence but also nonnegativity, sparseness, 
smoothness or orthogonality). There is neither a theoretical nor an experimental basis that 
statistical independence (ICA) is the unique right concept to extract brain sources or iso- 
late brain networks [0. In real world scenarios, latent (hidden) components (e.g., brain 
sources) have various complex properties and features. In other words, true unknown in- 
volved components are seldom all statistically independent. Therefore, if we apply only one 
single criterion like ICA, we may fail to extract all desired components with physical inter- 
pretation. We need rather to apply an ensemble of strategies by employing several suitably 

2 This form of factorizations are typical with EEG/MEG related data for multi-subjects, multi-tasks, while 
the factorization for the transposed of Y is typical for fMRI data lfl6l [T71 ITSTl . 
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chosen criteria and associated learning algorithms to extract all desired components HUID- 
For these reasons, we have developed multiway BSS methods, which are based not only 
on the statistical independence of components, but rather exploits multiple criteria or di- 
versities of factor matrices in various modes, in order to extract physiologically meaningful 
components, with specific features and statistical properties [jTllH. By diversity, we mean 
different characteristics, features or morphology of source signals or hidden latent variables 
(7]|. Since multi-array data can be always interpreted in many different ways, some a priori 
knowledge is needed to determine, which diversities, characteristics, features or properties 
represent true latent (hidden) components with physical meaning. 

It should be noted, although standard 2D BSS (constrained matrix factorizations) ap- 
proaches, such as ICA, NMF, SCA, SmoCA, PCA/SVD, and their variants, are invaluable 
tools for feature extraction and selection, dimensionality reduction, noise reduction, and data 
mining, they have only two modes or 2-way representations (typically, space and time), and 
their use is therefore limited. In many neuroscience applications the data structures often 
contain higher-order ways (modes) such as subjects, groups, trials, classes and conditions, 
together with the intrinsic dimensions of space, time, and frequency. For example, studies in 
neuroscience often involve multiple subjects (people or animals) and trials leading to exper- 
imental data structures conveniently represented by multiway arrays or blocks of multiway 
data. 

The simple linear BSS models ©-© can be naturally extended for multidimensional 
data to the multiway BSS models using constrained tensor decompositions. In this paper, 
we consider a general and flexible approach based on the Tucker decomposition model - 
called also Tucker-N model (see Fig. Q](a)) [EMU El: 

71=1 72=1 j N =l 

= Gxj U (1) x 2 U (2) --- x N U (N) +E 

= Gx{U} + E = Y + E, (5) 

where Y e R IlXlr " xlN is the given data tensor, G 6 jg/ix^-x-Z/v [ s a core tensor of reduced di- 
mension, U (n) = [iij , u!, , . . . , u " ] e R 7 "* 7 " (n = 1,2, ... ,N) are factors (component matri- 
ces) representing components, latent variables, common factors or loadings, Y is an approx- 
imation of the measurement Y, and E denotes the approximation error or noise depending 
on the context [Hl[3|2]|. The objective is to estimate factor matrices: U (n) , with components 
(vectors) u {n \ (n = 1,2,..., N, j„ = 1,2,..., J N ) and the core tensor G 6 fr^-x/jv 
assuming that the number of factors in each mode J n are known or can be estimated [QJ. 

If the factor matrices and a core tensor are orthogonal the Tucker model can be consid- 
ered as extension of the SVD model ©, known as the High Order SVD (HOSVD) ifBI 
While the optimal approximation of a matrix can be obtained by truncation of its SVD, 
the optimal tensor approximation (with a minimal norm of the error) cannot in general be 
obtained by truncation of the Tucker decomposition. However, it was shown that the trun- 
cation of the particular constrained version usually yields a quite good approximation |fl5l . 
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Figure 1 : Illustration of a 3-way tensor decomposition using a constrained Tucker-3 model; 
Objective is to estimate factor matrices \J (n) = [u^ , u£ , . . . , Uj ] e R 7 " x7 " (with desired 
diversities or statistical properties) and a possibly sparse core tensor G 6 M/ [Xj2Xj \ typically 
with J n « I„ (n =, 1,2,3). (b) Preselected factor matrices in Tucker-3 can be absorbed 
by a core tensor G, and this leads to Tucker- 1 models: Y = G (1) X! U (1) + E (1) = G (2) x 2 
U (2) + E (2) = G (3) x 3 U (3) + E (3) . Instead of applying the standard Alternating Least Squares 
(ALS) algorithms for the Tucker-3 model, we can apply unfolding of data tensor according 
to the Tucker- 1 models and then perform constrained factorizations of the unfolded matrices 
(multiway BSS) by imposing desired constraints (independence, sparseness, nonnegativity, 
smoothness or uncorrelation, etc.) 



In general, in the Tucker model the orthogonality constraints are usually not imposed. In- 
stead, we consider alternative constraints, such as sparseness, nonnegativity, smoothness or 
statistical mutual independence. It should be noted that the Tucker model is not unique in 
general, even if we impose some weak constraints. However, this model is unique if we 
impose suitable constraints (e.g., sparseness or independence). 

This leads to a concept of group multiway BSS. There are two possible interpretations 
or concepts of employing Tucker decomposition as multiway BSS. In the first concept the 
columns of factor matrices U (,l) represent desired components or latent variables and the 
core tensor represent some "mixing process" or more precisely the core tensor shows links 
among components in different modes, while data tensor Y represents collection of 1-D or 
2-D mixing signals. In the second concepts, the core tensor represents desired but unknown 
(hidden) Af-dimensional signal (e.g., 3D MRI image or 4D video) and factor matrices rep- 
resent various transformations, e.g., time frequency transformation or wavelets dictionaries 
(mixing or filtering processes), while a data tensor Y represents observed Af-dimensional 



5 



signal which is distorted, transformed compressed or mixed depending on applications. In 
this paper, we will consider only the first interpretation. 

The Tucker-Af model © can be represented by n approximative matrix factorizations 
with three factors: 

Y (n) = U (,l) G (n) Z (n) , (n =\,2,...,N), (6) 

where Z (n) = [u (A ° ® • • • ® U ( " +1) ® U (n - 1} ■ • • ® U (1) ] r . 

Moreover, the Tucker- Af model © can be compressed to N Tucker- 1 models, with only 
one factor matrix U (,,) , in each mode n (see Fig[T](b)): 

Y = G {n) X„ U (n) or in matrix form Y M = U (B) Gjj, (7) 

where G ( ' !) = Gxi U (1) x 2 • • • x„_j U (,, - 1} x„ +1 U ( " +1) • • • x w U w , (n = 1 , 2, . . . , N). Note that 
the above models correspond to group BSS/ICA models ©, with Y n = YL, A„ = [G^] r 
and B„ = U (M) , Vn, under the assumption that we impose desired constraints for factor 
matrices U ( " } . 

Most of the existing approaches exploit only the CP model and impose only statistical 
independence constraints This leads to tensor probabilistic ICA [fT9ll or ICA-CPA models 
Il2~2l 12311 . However, our approaches are quite different, because we use the Tucker models 
and we are not restricting only to ICA assumptions but exploit multiple criteria and allows 
for diversities of components. The advantage of the Tucker model over the CP model is 
that the Tucker model is more general and the number of components in each mode can be 
different and furthermore the components are linked via a core tensor, and hence allows us 
to model more complex hidden data structures. Note that the Tucker decomposition can be 
simplified to the CP model in the special case, where the "hyper-cube" core tensor (with 
J = Ji = J 2 = ■ ■ ■ = Jn) has nonzero elements only on the super-diagonal. The CP model 
has unique factorization without any constraints under some mild conditions. 

For the Tucker and the CP models there exist many efficient algorithms [Q~l |2). Most 
of them are based on the ALS (Alternating Least Squares) and HALS (Hierarchical ALS) 
HI [81 |24l [HI and CUR (Column-Row) decompositions ll25ll . However, description of these 
algorithms is out of the scope of this paper. 

We can implement Multilinear BSS algorithms in several ways. First of all, we can 
minimize a global cost function, with suitable penalty and/or regularization terms to estimate 
desired components (see Eq. ©): 

D F (Y||G, {U}) = ||Y - G x {U}\\ 2 F + J] <*nC n (U (n) ), (8) 

n 

where a„ > are penalty coefficients and C ; ,(U (n) ) are penalty terms, which are added to 
achieve specific characterstic of the components. For example, if we need to impose mu- 
tual independence constraints the penalty terms can take the following form C„(U ( ^) = 
Yjn=\ Tjj+p u ;" )r| Ap( u n !) )' where iff n (u) are suitable nonlinear functions. In principle, this 
method referred as penalized tensor decompositions, allows to find the factor matrices U (,l) , 
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with unique components u . and an associated core tensor but the method involves heavy 
computations and it is time consuming. 

Another approach is to apply standard Tucker decompositions method, without applying 
any desired constraints using ALS [□, CUR fl23 or HOOI/HOSVD O |H algorithms 
and in the next step apply for each factor matrix standard constrained matrix factorization 
algorithms (e.g., ICA, NMF or SCA) fl2TJ|26]|. Assuming that each unconstrained factor U ( " ) 
in the model §5$ can be further factorized using standard BSS algorithms as U ( " ) = B„Aj, 
we can formulate the following decomposition model: 

Y = Gxi U (1) x 2 U (2) ■■■ x N U (A ° (9) 
= G Xifii x 2 B 2 x N B N , (10) 

where G = |CX] A[ x 2 Aj • ■ • x N A^j. It is worth to note, that in each mode, we can apply 
different criteria for matrix factorization. 

Alternatively, a simpler approach is to perform the unfolding the data tensor Y for each 
mode n, according to the Tucker- 1 decompositions © and to apply directly a suitable con- 
strained factorization of matrices (or a penalized matrix decomposition)^] 

Y W =BX or Y[ n) =AX> (n = l,...,N) (11) 

subject to desired constraints, by employing standard efficient BSS algorithms (e.g., NMF 
or ICA) flU |26j. Since some matrices Y(„) may have large dimensions, we usually need to 
apply some efficient methods for dimensionality reduction Il26l . Finally, the core tensor, 
which shows the links among components in different modes can be computed as 

G = Y Xi [Bi] + x 2 [B 2 ] + • • • x* [B N ] + , (12) 

where [-] + denotes Moore-Penrose pseudo-inverse of a matrix. 

Finally, we can apply the Block Tensor Decomposition (BTD) approach [3], with suit- 
able constraints imposed on all or only in some preselected factors or components. In this 
particular case, the simplest scenario is to employ a constrained Block Oriented Decompo- 
sition (BOD) model (combining or averaging N Tucker- 1 models) 0]: 

1 N 

- = nYj (- n) Xn U(n) ) + ^ ( 1 3) 

n=l 

This model allows us to estimate all desired components in parallel or sequentially by min- 
imizing the norm of the total errors (||E|||) subject to specific constraints. 



3 Unfortunately, this approach does not guarantee best fitness of the model to the data or minimum norm 
of errors ||E||^,, but usually the solutions are close to optimal. Note that in practice, for large scale problems, 
we do not need to perform explicitly unfolding of the full data tensor. Instead, we may apply fast sampling of 
tubes (columns) of data tensors [25|. 
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3 Dimensionality Reduction, Feature Extraction and Clas- 
sification of Multiway Data 



Dimensionality reduction, feature extraction and selection are essential problems in the 
analysis of multidimensional datasets with large number of variables lfT4l . We shall first 
illustrate the basic concepts of dimensionality reduction and feature extraction for a set of 
large-scale sample matrices. Let us consider, that we have available a set of K matrices 
(2-D samples) X w eR' lXll ,(k = l,2,...,K) that represent multi subjects or multi-trials 2D 
data, which belong to C different classes or categories (e.g., multiple mental task/state data 
or different mental diseases). In order to perform dimensionality reduction and to extract 
essential features for all the training samples, we apply simultaneous (approximative and 
constrained) matrix factorizations (see Fig. 0(a)): 

X ik) =u (1) F w U (2)T + E w , (k=l,2,.-.,K), (14) 

where the two common factors (basis matrices) U (1) e R. /lX/l and U (2) e 9J 2Xj2 , J„ < I n , in 
code (explain) each sample X w simultaneously along the horizontal and vertical dimensions 
and the extracted features are represented by matrices F w e R J,xj2 , typically, with J t « I { 
and J 2 « h- In special cases F w are squared diagonal matrices. This problem can be 
considered as a generalization of Joint Approximative Diagonalization (JAD) lfTll2"7Tl. 

The common method to solve such matrix factorizations problem is to minimize a set of 
cost functions ||X W - U (1) F w U^ 7 !!^, ik sequentially or in parallel, with respect to all the 
factor matrices. We can solve the problem more efficiently by concatenation or tensorization 
of all matrices X® along the third dimension to form an I\ x I 2 x K dimensional data tensor 
X e ^JiXhxK anc [ p er f orm the standard Tucker decomposition (see Fig. 12(b)). 

In order to obtain meaningful components and a unique decomposition it is often conve- 
nient to impose additional constraints [HIO. In the special case when the feature matrices 
Fjt are diagonal the Tucker-2 model is reduced to special form of the unique CP model 0]|. 

This approach can be naturally and quite straightforwardly extended to multidimensional 
data. Assume, that we have available a set of multidimensional tensors X® e r^x^x-x/^ 
(k = 1,2,..., K), representing training data belonging to C classes or categories. Each 
training sample X (/:) is given a label indicating the category (class) to which it belongs. 

In order to preform a dimensionality reduction and to extract significant features, we 
need to apply simultaneous approximative tensor decompositions (see Fig. © 

X<*> = G w Xl U (1) x 2 U (2) • • • x N U w + E<*>, (15) 

(k = 1,2, . . .,K), where the compressed core tensors G w e r^ix^zx-xJjv representing fea- 
tures are of lower dimension than the original data tensors X (k \ and we assume that the 
factors (basis matrices) U (n) = [Uj ,ui , . . . ] 6 R 1,,xj ",(n = 1,2, ... ,N) are common 
factors for all data tensors. 

To compute the factor matrices U (,,) and the core tensors G (k \ we concatenate all training 
(sample) tensors into one N + 1 order training data tensor X = cat(X (1) , X (2) , . . . , X ( ^, Af + 
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(a) 





(b) 




Figure 2: (a) Simultaneous matrix factorizations for dimensionality reduction and feature 
extraction, (b) This problem is equivalent to a Tucker-2 decomposition of a 3-way tensor 
into a constrained core tensor F e -gJi*J2*K (representing features) and two basis factors 
U(i) e R /ix/, andU (2) 6 R / 2 x7 2 _ 



1) e R^^-x/A^ with + 1 = ^ and perform the Tucker-N decomposition Ifl4l : 

X = G XiU (1) x 2 U (2) --- x N U (N) +E, (16) 

where the sample tensors X (k) can be extracted back from the concatenated tensor by fixing 
the (N + l)-th index at a value k, i.e., X(:, :,...,:, k ) = X w and the individual features 

— 1 2 N N+l ~ 

(corresponding to different classes) are extracted from the core tensor G e R^^-x/atx/at+i as 
G (k) = G(:, k ), with 7w +1 = ^. In other words, the features of a specific training 

12 N N+l 

data X w are represented by the k-th row of the mode-(N + 1) matricized version of the core 
tensor G. 

The above procedure for the feature extraction has been applied for a wide class of clas- 
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Figure 3: Illustration of feature extraction from a set of 3-way data tensors X (k) e R IlX ' 2Xl3 , 
( k = 1,2,..., ^f). The objective is to estimate common factor matrices (bases) U (,,) e M/" XJ " 
{n = 1,2,3) and core tensors G (k) e R/i*-^*- 7 ^ typically with 7„ « /„. 



sification problems, as illustrated in Fig|4](a) |[T4l . In the first stage, we extracted features 
and estimated the basis matrices U (,!) for the concatenated tensor of all training data X w with 
labels and we built up a projection filter to extract the features of the test data (without la- 
bels). In the next step, we extracted features of test data using estimated common basis. The 
extracted features were then compared with the training features using standard classifier, 
e.g., LDA, KNN, SVM lfT4ll . We applied the procedure described above to various feature 
extraction classification problems Ifl4l . For example, we applied this procedure to success- 
fully classify three groups of human subjects: Control age matched subjects (CTRL), Mild 
Cognitive Impairment (MCI) and probable Alzheimer's disease (AD) patients (see Fig. 0] 
(b)) on the basis of spontaneous, followup EEG data Il28l [T4ll . All the training EEG data 
were organized in the form of 3-way tensors using a time-frequency representation by ap- 
plying Morlet wavelets. Each frontal slice represented preprocessed data of one subject. We 
applied the standard Tucker decomposition using HOOI and NTD algorithms to extract the 
basis matrices U (,1) (n = 1,2) and reduced features represented by a core tensor G. The 
new test data in the form of matrices X w were projected via the basis matrices to extract 
features. The extracted test features were compared with the training features and classifica- 
tion was performed using LDA, KNN and SVM. We obtained quite promising classification 
accuracies to predict AD ranging from 82% to 96% depending on EEG data sets. 
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Figure 4: (a) Conceptual diagram illustrating a classification procedure based on the Tucker 
decomposition of the concatenated tensor X = GXj U (1) x 2 U (2) ■ ■ ■ x N U (N) e M. I[Xlr " lNXK 
consisting of all sampling training data X w e ~Bj lXl2 "' xlN . Reduced features are obtained by 
projecting the testing data tensor X (r) onto the feature subspace spanned by factors (bases) 
U (,!) . The projection filter depends on the factors U (,,) . If they are orthogonal, we can apply 
a simple projection filter: G w = X (0 Xi U (l) T x 2 U (2) 1 ■ ■ ■ x N U w T (otherwise, see Eq. ([Hi 
(b) Application of the general procedure for the classification of Alzheimer's disease (AD), 
Mild Cognitive Impairment (MCI) patients and age matched control subjects (CTRL). 
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4 Linked Multiway BSS/ICA 



In neuroscience, we often need to perform a so called group analysis which seeks to identify 
mental tasks or stimuli driven brain patterns that are common in two or more subjects in a 
group. 

Various group ICA methods have been proposed to combine the results of multi-subjects, 
multi-tasks only after ICA is carried out on individual subjects and usually no constraints 
are imposed on components, which are allowed to differ with respect to their spatial, spectral 
maps, as well as to their temporal patterns. However, in many scenarios some links need 
to be considered to analyze variability and consistency of the components across subjects. 
Furthermore, some components do not need to be necessarily independent, they can be 
instead sparse, smooth or non-negative (e.g., for spectral components). Moreover, it is often 
necessary to impose some constraints to be able to estimate some components, which are 
identical or maximally correlated across subjects with regard to their spatial distributions, 
spectral or temporal patterns. This leads to a new concept and model of linked multiway 
BSS (or linked multiway ICA, if statistical independence criteria are used). 

In the linked multiway BSS, we perform approximative decompositions of a set of data 
tensors X (s) 6 m/ ixIt " xIn , (s = 1, 2, . . . , S) representing multiple subjects and/or multiple 
tasks (see Fig. |5]): 

X w = G w Xj U (U) x 2 U (2 ' 4) • • • x N U iN ' s) + E w , (17) 

where each factor (basis matrix) U ( "' v) = [U^f , Uy' ] e R. 7 » x7 " is composed of two bases: 
Ujn) £ R I„xR„ (wim <R n < which are common bases for all subjects in the group and 
correspond to the same or maximally correlated components and Up* 6 R I - X< ~ J "' R "\ which 
correspond to stimuli/tasks independent individual characteristics. 

For example, X w , (s = 1,2, .. .5) denotes the brain activity in space-time-frequency 
domains for the 5-th subject. Based on a set of such data, we can compute common factors 
and interactions between them. For example, linked multiway BSS approach my reveal 
coupling of brain regions, possibly at different time slots and/or different frequency bins. 

To solve problems formulated this way, we can apply similar procedures to the one de- 
scribed in previous two Sections. If U ( "" v) = e R 7 "* 7 " for a specific mode n, then we 
can concatenate all data tensors along this mode, perform tensor decomposition and apply 
any standard BSS algorithm to compute desired components in each mode (e.g., to estimate 
independent components, we apply any standard ICA algorithm). In the more general case, 
when the number of common components R n are unknown, we perform an additional un- 
folding for each data tensor X ( v) in each mode and then perform a set of constrained matrix 
factorizations (by applying standard algorithms for ICA, NMF, SCA, SmoCA, PCA/SVD, 
etc.): 

Xg = V M GgZ^ + E w = B ns Al s , (18) 

for n = 1, 2, . . . , N and s = 1, 2, . . . , S , where matrices B„ iS = U ( "' i} represent individual 
linked components, while matrices Aj s = G^Z^ represent basis vectors or linked mixing 
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Figure 5: Conceptual model of tensors decomposition for linked multiway BSS (espe- 
cially, Linked Multiway ICA). Objective is to find constrained factor matrices U ( ' 1i) = 



[U£\ Uf' s) ] 6 VL'" XJ ", n = 1,2,3) and core tensors G w 6 R**^, which are partially 
linked or maximally correlated, i.e., they have the same common components or highly 
correlated components. 



processes and Z ( ^ } = [u (AW ® ■ ■ ■ ® U (,,+U) <8> \] {n ~ U) •••<£> U (U) ] . 

In the next stage, we need to perform the statistical analysis and to compare the individ- 
ual components extracted in each mode n, for all subjects S , by performing clustering and 
ordering components to identify common or similar (highly correlated) components (see 
e.g., EE!)- In the last stage, we compute core tensors, which describe the functional links 
between components (see Eq. (PT21). 

For the linked multiway BSS, we can also exploit constrained Block Tensor Decompo- 
sition (BTD) models for an averaging data tensor across subjects: 

s 

X = (G w Xi U (U) x 2 U (2 " s) • • • x N U Wi) ) + E. (19) 

s=l 

Such model can provide us additional information. 

In group and linked multiway BSS and ICA, we usually seek stimuli driven ERPs (event 
related responses) and/or task related common components or common bases reflecting both 
intra subject and inter subject features as bases, which are independent involving individ- 
ual on going brain activities [29]. In other words, we seek event/task-related components 
Uf, that are identical in each mode or maximally correlated across subjects and event/task 
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independent individual bases U) , which are independent or even as far apart as possible 
(anti-correlated) Il29l . 

Note that our Linked Multiway BSS is different from the linked ICA [|23Tl and group 
NMF Il29ll30ll . since we do not limit components diversity by constraining them to be only 
statistically independent or only nonnegative components and our model is based on con- 
strained Tucker models instead of rather the quite restrictive trilinear CP model. 

How to select common components will depend on the validity of the underlying as- 
sumptions and a priori knowledge. For instance, identical spatial distributions can well be 
assumed for homogeneous subject groups, but may be unacceptable in studies that include 
patients with different ages or mental diseases or abnormalities. Temporal components may 
be the same for stimulus- or task-evoked responses that are related to a specific mental task 
or paradigm, but these will vary for the spontaneous fluctuations that occur in resting state 
experiments. In some experiments, responses may be assumed similar or identical within 
but different across subgroups [fT8llT7l . 

We conclude that the proposed Linked BSS model provides a framework that is very 
flexible and general and it may substantially supplement many of the currently available 
techniques for group ICA and feature extraction models. Moreover, the model can be ex- 
tended to a multiway Canonical Correlation Analysis (MCCA), in which we impose max- 
imal correlations among normalized factor matrices (or sub set of components) and/or core 
tensors. 

5 Multiway Partial Least Squares 

The Partial Least Squares (PLS) methods (originally developed in chemometrics and econo- 
metrics) are particularly suitable for the analysis of relationships among multi-modal brain 
data (e.g., EEG, MEG, ECoG (electrocorticogram), fMRI) or relationships between mea- 
sures of brain activity and behavior data [ST1 1321 . The standard PLS approaches have been 
recently summarized by H. Abdi [1311 and their suitability to model relationships between 
brain activity and behavior (experimental design) has been highlighted by A. Krishnan et al. 

In computational neuroscience, there are two related basic PLS methods: PLS correla- 
tion (PLSC), which analyzes correlations or associations between two or more sets of data 
(e.g., two modalities brain data or brain and behavior data) and PLS regression (PLSR) 
methods, which attempt to predict one set of data from another independent data that con- 
stitutes the predictors (e.g., experimental behavior data from brain data such multichannel 
ECoG or scalp EEG from ECoG, by performing simultaneous recordings for epileptic pa- 
tients). 

In order to predict response variables represented by matrix Y from the independent 
variables X, the standard PLSR techniques find a set of common orthogonal latent vari- 
ables (also called latent vectors, score vectors or components) by projecting both X and 
Y onto a new subspace, which ensures a maximal correlation between the latent vari- 
ables of X and Y OTTl . In other words, the prediction is achieved by simultaneous ap- 
proximative decompositions training data sets: X 6 R IxN and Y 6 R /xM into components 
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(A = [ai,a 2 , . . . , a 7 ] e R /x/ , B = [b ls b 2 , . . . , by] e R NxJ and C = [c u c 2 , . . . , Cj ] e R Mx7 ) : 

X = AB r = £a;bJ, (20) 

j=i 

j 

Y = ADC T = Y J dj j a j c T j , (21) 

j=i 

where D e R 7x/ is a scaling diagonal matrix; with the constraint that these components 
explain as much as possible of the covariance between X and Y lf3~Tll3~3~Tl . The latent compo- 
nents A are defined as A = XW, where W = [wi,W2, ■ ■ ■ , Wj] e R NxJ consists of J direction 
vectors Wj. The basic concept of the standard PLS is to find these directions vectors Wj from 
successive optimizations problems: 

Wj = argmax {corr 2 (Xw, Y) var(Xif)} , 
w 1 ' 

s.t. w T w = 1, w T X T Xw k = 

fovk= 1,2,. ..,7-1. 

Such decompositions and relationships between components can be used to predict val- 
ues of dependent variables for new situations, if the values of the corresponding independent 
variables are available. 

Since the brain data are often multidimensional and multi-modal and can be naturally 
represented via tensors, attempts have been made to extend PLS methods to multiway mod- 
els IT33TI . In this section, we briefly describe multiway PLS based on a constrained Tucker 
model (PTD): Given an Mh-order independent data tensor X e R /lX "' x/ " x ' " X ' N and an Mth- 
order dependent data tensor Y 6 R^' x " xKmX " xK m^ w it n the same size in at least one mode 
(typically, the first mode)Q I\ = K\ . 

Our objective is to preform simultaneous constrained Tucker decomposition, with at 
least one (or two) common or maximally correlated factor(s) (see Fig. [6]): 

X = G x x 1 U (1) x 2 U (2) x 3 ---x w U w + E x , (22) 
Y = G y x 1 A (1) x 2 A (2) x 3 ---x M A (M) +E y , (23) 

where additional constraints are imposed: U (1) = A (1) 6 R /lX/l (and U (2) = A (2) 6 R /2X72 of 
h = K 2 ), while other factor matrices are essentially different (e.g., orthogonal or mutually 
independent). Note that the core tensors G x and G y have special block-diagonal structures 
(see Fig. [6]) that indicate sparseness. New algorithms for constrained Tucker based multiway 
PLS models have been developed in OBI . 

Such models allow for different types of structures onX and Y and provides a general 
framework for solving multiway regression problems that explore complex relationships 

4 The first mode, usually, corresponds to the samples mode or the time mode. However, in some applica- 
tions, for example, for simultaneous recordings of EEG and ECoG data two modes (ways) may have the same 
size and they represent time and frequency (temporal and spectral modes, respectively). 
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K 2~h 

(K^K 2 ^K 3 ) (I^xjJ (J,x/ 2 x]; 3 ) (J 2 x/ 2 ) 

Figure 6: Illustration of a generalized Tucker-based Multiway PLS model for 3-way tensors 
in the case when two factor matrices are common (see Eqs. (l22l )-(l23T)'). Our objective is 
to perform approximative decompositions of independent tensor X e R /lX/2 "' xIn and depen- 
dent tensor Y e fl^i***-*** s wifa ^ = ^ an( j j 2 _ ^ j-,y imposing additional constraints, 
especially that the two factor matrices are the same or highly correlated: A (1) = U (1) and 
A (2) = U (2) , while the other factors are uncorrected (or anti-correlated) or statistically inde- 
pendent. We assumed also that the core tensors are sparse and block diagonal. 

between multidimensional dependent and independent variables. For example, tensor de- 
compositions can be applied in emerging neuroimaging genetics studies to investigate links 
between biological parameters measured with brain imaging and genetic variability [|34l . 

6 Conclusions 

Tensor decompositions is a fascinating emerging field of research, with many applications 
in multilinear blind source separation, linked multiway BSS/ICA, feature extraction, classi- 
fication and prediction. In this review/tutorial paper, we have briefly discussed several new 
and promising models for decompositions of linked or correlated sets of tensors, by impos- 
ing various constraints on factors, components or latent variables depending on the specific 
applications. We have developed several promising and efficient algorithms for such models, 
which can be found in our recent book, publications and reports. 
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